160 research outputs found

Ontologies and Information Extraction

Author: Nazarenko Adeline
Nédellec Claire
Publication date: 01/01/2005

This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. The report shows that the amount of knowledge involved depends on the nature and depth of the interpretation needed to extract the information. It is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which is becoming a major application domain for IE.

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Adapting a general parser to a sublanguage

Author: Aubin Sophie
Nazarenko Adeline
Nédellec Claire
Publication date: 01/01/2005

In this paper, we propose a method to adapt a general parser (Link Parser) to sublanguages, focusing on the parsing of texts in biology. Our main proposal is the use of terminology (identification and analysis of terms) in order to reduce the complexity of the text to be parsed. Several other strategies are explored and finally combined, among which are text normalization, extensions of the lexicon and the morpho-guessing module, and adaptation of the grammar rules. We compare the parsing results before and after these adaptations.

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Text-mining and ontologies: new approaches to knowledge discovery of microbial diversity

Author: Bossy Robert
Chaix Estelle
Deléger Louise
Nédellec Claire
Publication date: 24/10/2017

Microbiology research has access to a very large amount of public information on the habitats of microorganisms. Many areas of microbiology research use this information, primarily in biodiversity studies. However, the habitat information is expressed in unstructured natural language, which hinders its exploitation at large scale. Similar habitats are very often described by different terms, e.g. intestine and gut, which makes them hard to compare automatically. A common reference to standardize these habitat descriptions, as claimed by (Ivana et al., 2010), is a necessity. We propose the OntoBiotope ontology, which we have been developing since 2010. The OntoBiotope ontology has a formal machine-readable representation that enables indexing of information as well as conceptualization and reasoning. Comment: 5 pages

arXiv.org e-Print Archive

HAL Descartes

Information extraction from bibliography for Marker Assisted Selection in wheat

Author: Bossy Robert
Golik Wiktoria
Nédellec Claire
Ranoux Marion
Sourdille Pierre
Valsamou Dialekti
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Improvement of most animal and plant species of agronomical interest in the near future has become an international concern because of the increasing demand for feeding a growing world population and the need to mitigate the reduction of industrial resources. The recent advent of genomic tools has improved the discovery of linkage between molecular markers and genes that are involved in the control of traits of agronomical interest such as grain number or disease resistance. This information is mostly published in scientific papers but is rarely available in databases. Here, we present a method that automatically extracts this information from the scientific literature, relying on a knowledge model of the target information and on the WheatPhenotype ontology that we developed for this purpose. The information extraction results were evaluated and integrated into the on-line semantic search engine AlvisIR WheatMarker.

Crossref

HAL Clermont Université

HAL Descartes

Hal-Diderot

BioNLP Shared Task - The Bacteria Track

Author: Alphonse Erick
Bessières Philippe
Bossy Robert
Jourde Julien
Manine Alain-Pierre
Nédellec Claire
van de Guchte Maarten
Veber Philippe
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: We present the BioNLP 2011 Shared Task Bacteria Track, the first Information Extraction challenge entirely dedicated to bacteria. It includes three tasks that cover different levels of biological knowledge. The Bacteria Gene Renaming supporting task is aimed at extracting gene renaming and gene name synonymy in PubMed abstracts. The Bacteria Gene Interaction task is a gene/protein interaction extraction task from individual sentences. The interactions have been categorized into ten different sub-types, thus giving a detailed account of genetic regulations at the molecular level. Finally, the Bacteria Biotopes task focuses on the localization and environment of bacteria mentioned in textbook articles. We describe the process of creation for the three corpora, including document acquisition and manual annotation, as well as the metrics used to evaluate the participants' submissions. Results: Three teams submitted to the Bacteria Gene Renaming task; the best team achieved an F-score of 87%. For the Bacteria Gene Interaction task, the only participant reached a global F-score of 77%, although the system efficiency varies significantly from one sub-type to another. Three teams submitted to the Bacteria Biotopes task with very different approaches; the best team achieved an F-score of 45%. However, the detailed study of the participating systems' efficiency reveals the strengths and weaknesses of each participating system. Conclusions: The three tasks of the Bacteria Track offer participants a chance to address a wide range of issues in Information Extraction, including entity recognition, semantic typing and coreference resolution. We found common trends in the most efficient systems: the systematic use of syntactic dependencies and machine learning. Nevertheless, the originality of the Bacteria Biotopes task encouraged the use of interesting novel methods and techniques, such as term compositionality and scopes wider than the sentence.

Crossref

Springer - Publisher Connector

PubMed Central

ProdInra